Super-resolution Benthic Object Detection¶

In our last notebook, we evaluated MBARI's Monterey Bay Benthic Object Detector on the external TrashCAN dataset. There, we found that the model performed well, given the slight adaptations we had to make to compare against the new annotations. However, we also saw potential for improved model performance when applying some form of upscaling to the input images.

In this notebook, we will build a workflow to easily feed inputs from the TrashCAN dataset through a super-resolution layer before passing them to the MBARI model. We will then evaluate the model's performance with and without the super-resolution layer to see how much improvement, if any, we can achieve. It will also be important to measure computation time and memory usage to see if the trade-off is worth it.

In later notebooks, we can explore fine-tuning the setup built here. This is important to keep in mind when making decisions about how to implement the super-resolution layer.
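The intended two-stage setup can be sketched as a simple composition, with `upscale` and `detect` as hypothetical placeholders for the super-resolution and detection steps built below:

```python
from typing import Callable, List


def make_pipeline(upscale: Callable, detect: Callable) -> Callable:
    """Compose a super-resolution step with a detection step.

    Both arguments are placeholders: `upscale` maps a list of images to
    upscaled images, and `detect` runs the object detector on them.
    """
    def pipeline(images: List):
        return detect(upscale(images))
    return pipeline


# Toy stand-ins to illustrate the composition
double = lambda imgs: [img * 2 for img in imgs]
count = lambda imgs: len(imgs)

pipeline = make_pipeline(double, count)
print(pipeline([1, 2, 3]))  # 3
```

Keeping the two stages swappable like this is what makes later fine-tuning of the setup cheap.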

In [1]:
%load_ext autoreload
%autoreload 2
In [2]:
#%pip install -r ../requirements.txt
In [2]:
from fathomnet.models.yolov5 import YOLOv5Model
from IPython.display import display
from pathlib import Path
from PIL import Image
from pycocotools.coco import COCO
from typing import List 

import json
import onnxruntime
import os
import numpy as np
In [3]:
root_dir = Path(os.getcwd().split("personal/")[0])
repo_dir = root_dir / "personal" / "ocean-species-identification"

Load¶

We will start by loading the TrashCAN dataset, the MBARI model, and the label map between the two. Aside from path building, each requires only a single line of code to load.

In [4]:
data_dir = root_dir / "data" / "TrashCAN"
benthic_model_weights_path = root_dir / "personal" / "models" / "fathomnet_benthic" / "mbari-mb-benthic-33k.pt"
In [41]:
benthic_model = YOLOv5Model(benthic_model_weights_path)
trashcan_data = COCO(data_dir / "dataset" / "material_version" / "instances_val_trashcan.json")
benthic2trashcan_ids = json.load(open(repo_dir / "data" / "benthic2trashcan_ids.json"))
Using cache found in /Users/per.morten.halvorsen@schibsted.com/.cache/torch/hub/ultralytics_yolov5_master
YOLOv5 🚀 2024-2-24 Python-3.11.5 torch-2.2.1 CPU

Fusing layers... 
Model summary: 476 layers, 91841704 parameters, 0 gradients
Adding AutoShape... 
loading annotations into memory...
Done (t=0.17s)
creating index...
index created!

Super resolution model¶

  • (simple) feed COCO dataset through super resolution model
  • (simple) feed outputs through MBARI model & show detections
  • (full) build pipeline
In [6]:
onnx_model_path = root_dir / "personal" / "models" / "sr_mobile_python" / "models_modelx4.ort"

Upscale¶

Let's start with a single example, to learn the input and output formats.

In [7]:
# reuse some code from the previous notebook, ported to src.data
os.chdir(repo_dir)
from src.data import *
In [17]:
starfish_images = images_per_category("animal_starfish", trashcan_data, data_dir / "dataset" / "material_version" / "val")
# starfish_images[0]
In [18]:
for i in range(5):
    print(i)
    example_image = Image.open(starfish_images[i])
    print(np.array(example_image).shape)
    display(example_image)
    print("~~"*40)
0
(360, 480, 3)
No description has been provided for this image
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
1
(270, 480, 3)
No description has been provided for this image
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
2
(270, 480, 3)
No description has been provided for this image
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
3
(270, 480, 3)
No description has been provided for this image
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
4
(270, 480, 3)
No description has been provided for this image
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
In [19]:
example_image_path = starfish_images[3]
example_image = Image.open(example_image_path)
print(np.array(example_image).shape)
display(example_image)
(270, 480, 3)
No description has been provided for this image

The following methods were adapted from sr_mobile_python's inference module.

In [20]:
import numpy as np
import cv2
import onnxruntime
from glob import glob
import os
from tqdm.auto import tqdm


def pre_process(img: np.ndarray) -> np.ndarray:
    # H, W, C -> C, H, W
    img = np.transpose(img[:, :, 0:3], (2, 0, 1))
    # C, H, W -> 1, C, H, W
    img = np.expand_dims(img, axis=0).astype(np.float32)
    return img


def post_process(img: np.ndarray) -> np.ndarray:
    # 1, C, H, W -> C, H, W
    img = np.squeeze(img)
    # C, H, W -> H, W, C
    img = np.transpose(img, (1, 2, 0))
    return img


def save(img: np.ndarray, save_name: str) -> None:
    cv2.imwrite(save_name, img)


def inference(model_path: str, img_array: np.ndarray) -> np.ndarray:
    # unsure about the ability to train an ONNX model from a Mac
    ort_session = onnxruntime.InferenceSession(model_path)
    ort_inputs = {ort_session.get_inputs()[0].name: img_array}
    ort_outs = ort_session.run(None, ort_inputs)

    return ort_outs[0]
In [21]:
def upscale(image_paths, model_path):
    outputs = []

    for image_path in tqdm(image_paths):

        img = cv2.imread(image_path, cv2.IMREAD_UNCHANGED)

        if img.ndim == 2:
            img = cv2.cvtColor(img, cv2.COLOR_GRAY2BGR)

        if img.shape[2] == 4:
            # upscale the alpha channel separately, then recombine
            alpha = img[:, :, 3]  # GRAY
            alpha = cv2.cvtColor(alpha, cv2.COLOR_GRAY2BGR)  # BGR
            alpha_output = post_process(
                inference(model_path, pre_process(alpha))
            )  # BGR
            alpha_output = cv2.cvtColor(alpha_output, cv2.COLOR_BGR2GRAY)  # GRAY

            img = img[:, :, 0:3]  # BGR
            image_output = post_process(inference(model_path, pre_process(img)))  # BGR
            image_output = cv2.cvtColor(image_output, cv2.COLOR_BGR2BGRA)  # BGRA
            image_output[:, :, 3] = alpha_output
        else:  # 3-channel BGR after the grayscale conversion above
            image_output = post_process(inference(model_path, pre_process(img)))  # BGR

        outputs += [image_output.astype("uint8")]

    return outputs


example_upscaled = upscale([str(example_image_path)], onnx_model_path)[0]

print(example_upscaled.shape)
Image.fromarray(example_upscaled)
  0%|          | 0/1 [00:00<?, ?it/s]
(1080, 1920, 3)
Out[21]:
No description has been provided for this image
In [23]:
# # reshow the original image for comparison
# Image.fromarray(np.array(example_image))
In [79]:
# check the scale of the super-resolution image
x_scale = example_upscaled.shape[1] / example_image.size[0]
y_scale = example_upscaled.shape[0] / example_image.size[1]

(x_scale, y_scale)
Out[79]:
(4.0, 4.0)
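Since detections made on the upscaled image live in a 4x-larger coordinate system, their boxes must be divided by these scale factors before being compared against the original annotations; presumably this is what the `x_scale`/`y_scale` arguments to the evaluation handle later. A minimal sketch (the `[x1, y1, x2, y2]` box layout is an assumption):

```python
import numpy as np


def rescale_boxes(boxes: np.ndarray, x_scale: float, y_scale: float) -> np.ndarray:
    """Map [x1, y1, x2, y2] boxes from upscaled-image coordinates
    back to the original image's coordinate system."""
    boxes = boxes.astype(np.float64).copy()
    boxes[:, [0, 2]] /= x_scale  # x coordinates
    boxes[:, [1, 3]] /= y_scale  # y coordinates
    return boxes


boxes = np.array([[400.0, 200.0, 800.0, 600.0]])
print(rescale_boxes(boxes, 4.0, 4.0))  # [[100.  50. 200. 150.]]
```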

Classify upscaled (single)¶

In [24]:
example_detections = benthic_model._model(example_image)
upscaled_detections = benthic_model._model(example_upscaled)

example_detections.show()
upscaled_detections.show()
No description has been provided for this image
No description has been provided for this image

Here we see what we are trying to achieve with this super-resolution layer.

TODO: Look into why the color is off. The hue seems to be a bit redder in the upscaled version.
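One likely culprit for the hue shift: `cv2.imread` returns channels in BGR order, while `PIL.Image.fromarray` assumes RGB, so displaying the OpenCV output directly swaps the red and blue channels. A numpy-only illustration of the effect (whether this is actually the cause here still needs checking):

```python
import numpy as np

# A 1x1 "image" that is pure red in RGB terms
rgb = np.array([[[255, 0, 0]]], dtype=np.uint8)

# Reversing the channel axis mimics interpreting BGR bytes as RGB:
# red becomes blue, which is exactly the kind of swap that shifts hue.
bgr_as_rgb = rgb[..., ::-1]
print(bgr_as_rgb[0, 0])  # [  0   0 255]
```

If this is the cause, a `cv2.cvtColor(image_output, cv2.COLOR_BGR2RGB)` before handing arrays to PIL would fix the display.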

The first few examples showed little to no improvement, so we went with index 4 to show when such a pipeline might be useful. This is a form of cherry-picking our results, but mainly for visualization purposes. The final evaluation will compare the methods fairly, without any hand-selection of input data.

Build prediction pipeline¶

In [25]:
onnx_model_path
Out[25]:
PosixPath('/Users/per.morten.halvorsen@schibsted.com/personal/models/sr_mobile_python/models_modelx4.ort')
In [76]:
from fathomnet.models.yolov5 import YOLOv5Model

class YOLOv5ModelWithUpscale(YOLOv5Model):
    def __init__(self, detection_model_path: str,  upscale_model_path: str = None):
        super().__init__(detection_model_path)
        self.upscale_model_path = upscale_model_path


    def forward(self, X: List[str]):
        if self.upscale_model_path:
            X = upscale(X, self.upscale_model_path)
        return self._model(X)


upscale_model = YOLOv5ModelWithUpscale(benthic_model_weights_path, onnx_model_path)

upscaled_detections = upscale_model.forward([str(example_image_path)])  # upscale expects a list of image paths

upscaled_detections.show()
Using cache found in /Users/per.morten.halvorsen@schibsted.com/.cache/torch/hub/ultralytics_yolov5_master
YOLOv5 🚀 2024-2-24 Python-3.11.5 torch-2.2.1 CPU

Fusing layers... 
Model summary: 476 layers, 91841704 parameters, 0 gradients
Adding AutoShape... 
  0%|          | 0/1 [00:00<?, ?it/s]
No description has been provided for this image

I'll add a somewhat hacky fix here to make sure the two models expose the same call method. This will help standardize our evaluation setup later on.

In [75]:
def forward(self, X: List[str]):
    return self._model(X)

benthic_model.forward = forward.__get__(benthic_model)

example_detections = benthic_model.forward([str(example_image_path)])
example_detections.show()
No description has been provided for this image

Full category classifications¶

As a sanity check, let us see if we can produce predictions for several images at once. Here, we'll reuse the starfish images from above, since that's what the cells so far have loaded.

In [77]:
N = 5

raw_starfish_detections = benthic_model.forward(starfish_images[:N])
upscaled_starfish_detections = upscale_model.forward(starfish_images[:N])  

raw_starfish_detections.show()
upscaled_starfish_detections.show()
  0%|          | 0/5 [00:00<?, ?it/s]
No description has been provided for this image
No description has been provided for this image
No description has been provided for this image
No description has been provided for this image
No description has been provided for this image
No description has been provided for this image
No description has been provided for this image
No description has been provided for this image
No description has been provided for this image
No description has been provided for this image

Great! Now we can easily feed the TrashCAN dataset through the super-resolution model and then through the MBARI model. Let's pull in the evaluation methods developed in the last notebook and use them to compare our models.

Evaluation¶

Our evaluation will contain three main steps:

  1. Import the methods from our previous notebook
  2. Evaluate both the benthic_model and the upscaler_model
  3. Compare the results of the two models

We start by importing the methods from the previous notebook. These methods were ported to stand-alone code, for cleaner imports.

In [28]:
from src.evaluation import *
In [29]:
# rebuild some needed params locally
trashcan_ids = {
    row["supercategory"]: id
    for id, row in trashcan_data.cats.items()
}

# find trash index
trash_idx = list(benthic_model._model.names.values()).index("trash")
print(benthic_model._model.names[trash_idx])

# find trash labels 
trashcan_trash_labels = {
    id: name
    for name, id in trashcan_ids.items()
    if name.startswith("trash")
}
trashcan_trash_labels
trash
Out[29]:
{9: 'trash_etc',
 10: 'trash_fabric',
 11: 'trash_fishing_gear',
 12: 'trash_metal',
 13: 'trash_paper',
 14: 'trash_plastic',
 15: 'trash_rubber',
 16: 'trash_wood'}
In [65]:
# replace str keys with ints
benthic2trashcan_ids = {
    int(key): value 
    for key, value in benthic2trashcan_ids.items()
}

Run evaluation on both models¶

In [72]:
raw_starfish_metrics = evaluate_model(
    category="animal_starfish",
    data=trashcan_data,
    model=benthic_model,
    id_map=benthic2trashcan_ids,
    # verbose=2,
    # N=5,
    one_idx=trash_idx,
    many_idx=trashcan_trash_labels,
    exclude_ids=[trashcan_ids["rov"], trashcan_ids["plant"]],
    path_prefix=data_dir / "dataset" / "material_version" / "val"
)

raw_starfish_metrics
Precision: 0.39534882801514354
Recall: 0.08415841542495835
Average IoU: tensor(0.31285)
Out[72]:
{'precision': 0.39534882801514354,
 'recall': 0.08415841542495835,
 'iou': tensor(0.31285),
 'time': 41.07221722602844}
In [81]:
upscale_starfish_metrics = evaluate_model(
    category="animal_starfish",
    data=trashcan_data,
    model=upscale_model,
    id_map=benthic2trashcan_ids,
    # verbose=2,
    # N=5,
    one_idx=trash_idx,
    many_idx=trashcan_trash_labels,
    exclude_ids=[trashcan_ids["rov"], trashcan_ids["plant"]],
    path_prefix=data_dir / "dataset" / "material_version" / "val",
    x_scale=x_scale,
    y_scale=y_scale
)

upscale_starfish_metrics
  0%|          | 0/46 [00:00<?, ?it/s]
Precision: 0.20312499682617194
Recall: 0.06435643532496814
Average IoU: tensor(0.15016)
Out[81]:
{'precision': 0.20312499682617194,
 'recall': 0.06435643532496814,
 'iou': tensor(0.15016),
 'time': 45.43256592750549}

Metrics for all categories¶

In [82]:
def evaluate_both_models(category, N=-1, verbose=False):
    raw_metrics = evaluate_model(
        category=category,
        data=trashcan_data,
        model=benthic_model,
        id_map=benthic2trashcan_ids,
        verbose=verbose,
        N=N,
        one_idx=trash_idx,
        many_idx=trashcan_trash_labels,
        exclude_ids=[trashcan_ids["rov"], trashcan_ids["plant"]],
        path_prefix=data_dir / "dataset" / "material_version" / "val"
    )

    upscale_metrics = evaluate_model(
        category=category,
        data=trashcan_data,
        model=upscale_model,
        id_map=benthic2trashcan_ids,
        verbose=verbose,
        N=N,
        one_idx=trash_idx,
        many_idx=trashcan_trash_labels,
        exclude_ids=[trashcan_ids["rov"], trashcan_ids["plant"]],
        path_prefix=data_dir / "dataset" / "material_version" / "val",
        x_scale=x_scale,
        y_scale=y_scale
    )

    return raw_metrics, upscale_metrics
In [83]:
raw_fish_metrics, upscale_fish_metrics = evaluate_both_models("animal_fish")

print(raw_fish_metrics)
print(upscale_fish_metrics)
  0%|          | 0/100 [00:00<?, ?it/s]
{'precision': 0.4166666608796297, 'recall': 0.11406844063091848, 'iou': tensor(0.32055), 'time': 81.14837098121643}
{'precision': 0.30769229585798863, 'recall': 0.030418250834911596, 'iou': tensor(0.21463), 'time': 83.67938709259033}
In [84]:
raw_eel_metrics, upscale_eel_metrics = evaluate_both_models("animal_eel")

print(raw_eel_metrics)
print(upscale_eel_metrics)
  0%|          | 0/73 [00:00<?, ?it/s]
{'precision': 0.19565216965973545, 'recall': 0.05142857113469388, 'iou': tensor(0.14774), 'time': 48.71544289588928}
{'precision': 0.0, 'recall': 0.0, 'iou': 0.0, 'time': 53.536354064941406}
In [85]:
raw_crab_metrics, upscale_crab_metrics = evaluate_both_models("animal_crab")

print(raw_crab_metrics)
print(upscale_crab_metrics)
  0%|          | 0/39 [00:00<?, ?it/s]
{'precision': 0.07692307573964499, 'recall': 0.03246753225670434, 'iou': tensor(0.06751), 'time': 36.593260049819946}
{'precision': 0.006535947669699688, 'recall': 0.006493506451340867, 'iou': tensor(0.00869), 'time': 38.949223041534424}
In [86]:
raw_trash_metrics, upscale_trash_metrics = evaluate_both_models("trash_plastic")

print(raw_trash_metrics)
print(upscale_trash_metrics)
  0%|          | 0/340 [00:00<?, ?it/s]

Compare results¶

  • table
  • analysis
  • summarized findings

To come ...
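As a starting point for the comparison table, the metric dicts collected above could be assembled into a single pandas DataFrame. A sketch with toy values mirroring the structure returned by `evaluate_model` (in the notebook, the real `*_metrics` variables would be used instead):

```python
import pandas as pd

# Toy metric dicts with the same keys as evaluate_model's output
raw = {"precision": 0.417, "recall": 0.114, "iou": 0.321, "time": 81.1}
upscaled = {"precision": 0.308, "recall": 0.030, "iou": 0.215, "time": 83.7}

# One row per model, one column per metric
table = pd.DataFrame({"raw": raw, "upscaled": upscaled}).T
print(table)
```

From there, per-category rows (fish, eel, crab, trash_plastic) could be concatenated into one summary table for the analysis.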

Conclusion¶

Wrap things up and make a plan for next steps.